ABSTRACT

Distributed Denial of Service (DDoS) flooding attacks are one of the biggest challenges to
the availability of online services today. These attacks overwhelm the victim with a huge
volume of traffic and either render it incapable of normal communication or crash it
completely. If detection of a flooding attack is delayed, little can be done except to
manually disconnect the victim and fix the problem. With the rapid increase in DDoS volume
and frequency, current DDoS detection technologies are challenged to handle huge attack
volumes within a reasonable and affordable response time.

In this paper, we propose HADEC, a Hadoop-based live DDoS detection framework that tackles
efficient analysis of flooding attacks by harnessing MapReduce and HDFS. We implemented a
counter-based DDoS detection algorithm for four major flooding attacks (TCP-SYN, HTTP GET,
UDP and ICMP) in MapReduce, consisting of map and reduce functions. We deployed a testbed
to evaluate the performance of the HADEC framework for live DDoS detection. Based on the
experiments, we show that HADEC is capable of processing and detecting DDoS attacks in
affordable time.
1. INTRODUCTION

Distributed Denial of Service (DDoS) flooding attacks are one of the biggest concerns for
security and network professionals. The first DDoS attack incident [15] was reported in
1999 by the Computer Incident Advisory Capability (CIAC). Since then, most DoS attacks
have been distributed in nature, and they continue to grow in frequency, sophistication
and bandwidth. The main aim of these attacks is to overload the victim’s machine and make
its services unavailable, leading to revenue losses.

Over the years, DDoS attacks have hit major companies and Internet infrastructure,
incurring significant revenue losses. Yahoo! experienced one of the first major DDoS
flooding attacks, which took its services offline for about 2 hours [12]. In October 2002,
9 of the 13 DNS root servers were shut down for an hour because of a DDoS flooding attack
[8]. During the fourth quarter of 2010, a hacktivist group called Anonymous orchestrated
major DDoS flooding attacks and brought down the Mastercard, PostFinance, and Visa
websites [7]. Most recently, the online banking sites of 9 major U.S. banks (i.e., Bank of
America, Citigroup, Wells Fargo, U.S. Bancorp, PNC, Capital One, Fifth Third Bank, BB&T,
and HSBC) have been the continuous targets of a powerful series of DDoS flooding attacks
[15]. DDoS attacks continue to grow in sophistication and volume, with recent attacks
breaking the barrier of 100 Gbps [32].


The explosive increase in the volume of Internet traffic and in the sophistication of DDoS
attacks has posed serious challenges on how to analyze DDoS attacks in a scalable and
accurate manner. For example, two of the most popular open-source intrusion detection
systems (IDS), Snort [27] and Bro [25], maintain per-flow state to detect anomalies.
Internet traffic doubles every year, and as a result monitoring large amounts of traffic
for real-time anomaly detection with a conventional IDS has become a bottleneck.

In [20], Lee et al. proposed a DDoS detection method based on Hadoop [1]. They used a
Hadoop-based packet processor [19] and devised a MapReduce [5] based detection algorithm
against the HTTP GET flooding attack. They employ a counter-based DDoS detection algorithm
in MapReduce that counts the total traffic volume or the number of web page requests to
pick out attackers from the clients. For their experiments, they used multiple Hadoop
nodes (max. 10) in parallel to show the performance gains for DDoS detection.
Unfortunately, their proposed framework, in its current form, can only be used for offline
batch processing of huge volumes of traces. The problem of developing a real-time defense
system for live analysis still needs to be tackled.

In this paper, we propose HADEC, a Hadoop-based live DDoS detection framework. HADEC is a
novel destination-based DDoS defense mechanism that leverages Hadoop to detect live DDoS
flooding attacks in wired networked systems. HADEC comprises two main components, a
capturing server and a detection server. Live DDoS detection starts with the capturing of
live network traffic, handled by the capturing server. The capturing server then processes
the captured traffic to generate log files and transfers them to the detection server for
further processing. The detection server manages a Hadoop cluster and, on receipt of the
log file(s), starts MapReduce-based DDoS detection jobs on the cluster nodes. The proposed
framework implements a counter-based algorithm to detect four major DDoS flooding attacks
(TCP-SYN, UDP, ICMP and HTTP GET). These algorithms execute as reducer jobs on the Hadoop
detection cluster.

We also deploy a testbed for HADEC, which consists of a capturing server, a detection
server and a cluster of ten physical machines, each connected via a Gigabit LAN. We
evaluate the HADEC framework for live DDoS detection by varying the attack volume and the
number of cluster nodes. HADEC is capable of analyzing 20 GB of log file, generated from
300 GBs of attack traffic, in approx. 8.35 mins on a cluster of 10 nodes. For small log
files representing 1.8 Gbps, the overall detection time is approx. 21 seconds.

The rest of the paper is organized as follows. §2 describes the state of the art. §3
describes the HADEC framework design. In §4 we discuss the testbed and demonstrate the
performance of the proposed framework. Finally, we conclude the paper in §5.


2. RELATED WORK 


Since the inception of DDoS flooding attacks, several defense mechanisms have been
proposed in the literature [32]. This section highlights the defense mechanisms against
the two main types of DDoS flooding attacks, followed by a discussion on the application
of MapReduce/Hadoop to combat network anomalies, botnets and DDoS-related attacks.

DDoS flooding attacks can be categorized into two types based on the protocol level that
is targeted: network/transport-level attacks (UDP flood, ICMP flood, DNS flood, TCP SYN
flood, etc.) and application-level attacks (HTTP GET/POST request). The defense mechanisms
against network/transport-level DDoS flooding attacks roughly fall into four categories:
source-based, destination-based, network-based, and hybrid (distributed). The defense
mechanisms against application-level DDoS flooding attacks have two main categories:
destination-based and hybrid (distributed). Since the application traffic is not
accessible at layer 2 and layer 3, there is no network-based defense mechanism for
application-level DDoS. The following is a summary of the features and limitations of the
DDoS defense categories.

• Source-Based: In source-based defense mechanisms, detection and response are deployed
at the source hosts in an attempt to mitigate the attack before it wastes lots of
resources [21, 22]. Accuracy is a major concern in this approach, as it is difficult to
differentiate legitimate and DDoS attack traffic at the sources, where the traffic volume
is low. Further, there is little motivation for deployment at the source ISP due to the
added cost for a community service.

• Destination-Based: In this case the detection and response mechanisms are deployed at
the destination hosts. Access to the aggregate traffic near the destination hosts makes
the detection of DDoS attacks easier and cheaper, with higher accuracy, than other
mechanisms [26, 28, 29]. On the downside, destination-based mechanisms cannot preempt a
response to the attack before it reaches the victim and wastes resources on the paths to
the victim.

• Network-Based: With the network-based approach, detection and response are deployed at
the intermediate networks (i.e., routers). The rationale behind this approach is to filter
the attack traffic at the intermediate networks and as close to the source as possible
[23, 24]. Network-based DDoS defenses incur high storage and processing overhead at the
routers, and accurate attack detection is also difficult due to the lack of sufficient
aggregated traffic destined for the victims.


• Hybrid (Distributed): In the hybrid approach there is coordination among different
network components along the attack path, and detection and response mechanisms are
deployed at various locations. Destination hosts and intermediate networks usually deploy
the detection mechanisms, and the response usually occurs at the sources and the upstream
routers near the sources [30, 31]. The hybrid approach is more robust against DDoS
attacks, but due to its distributed nature it requires more resources at various levels
(e.g., destination, source, and network) to tackle DDoS attacks. The complexity and
overhead caused by coordination and communication among the distributed components is
also a limiting factor in the smooth deployment of hybrid DDoS defenses.


Analysis of logs and network flows for anomaly detection has been a problem in
information security for decades. New big data technologies, such as Hadoop, have
attracted the interest of the security community for their promised ability to analyze
and correlate security-related heterogeneous data efficiently and at unprecedented scale
and speed [13]. In the rest of the section, we review some recent techniques (other than
[20], discussed in §1) where Hadoop-based frameworks are used to build affordable
infrastructures for security applications.

BotCloud [17] proposes a scalable P2P detection mechanism based on MapReduce and a
combination of host and network approaches [18]. First, they generate a large dataset of
NetFlow data on an individual operator. Next, they apply the PageRank algorithm to the
NetFlow traces to differentiate the dependency of hosts connected in P2P fashion for the
detection of botnets. They moved the PageRank algorithm to MapReduce, so that it executes
on the data nodes of a Hadoop cluster for efficient execution.

Temporal and spatial traffic structures are essential for anomaly detectors to accurately
derive statistics from network traffic. Hadoop divides the data into multiple blocks of
the same size and distributes them across a cluster of data nodes to be processed
independently. This can make the analysis of network traffic difficult, because related
packets may be spread across different blocks, thus dislocating traffic structures.
Hashdoop [16] resolves this potential weakness by using a hash function to divide traffic
into blocks that preserve the spatial and temporal traffic structures. In this way,
Hashdoop conserves all the advantages of the MapReduce model for accurate and efficient
anomaly detection in network traffic.





Figure 1: Different Phases of HADEC 


3. HADOOP DDOS DETECTION FRAMEWORK


The Hadoop Based Live DDoS Detection Framework (HADEC) comprises four major phases (see
Fig. 1).

1. Network traffic capturing and log generation.

2. Log transfer.

3. DDoS detection.

4. Result notification.

Each of the above-mentioned phases is implemented as a separate component, and the
components communicate with each other to perform their assigned tasks. Traffic capturing
and log generation are handled at the capturing server, whereas DDoS detection and result
notification are performed by the detection server. Log transfer is handled through web
services. The following subsections explain the functionality of each phase/component in
detail.


3.1 Traffic Capturing and Log Generation

Live DDoS detection starts with the capturing of network traffic. HADEC provides a web
interface through which the admin can tune the capturing server with the desired
parameters. These parameters are the file size, the number of files to be captured before
initializing the detection phase, and the path to save the captured files. Once the admin
is done with the configuration, the Traffic Handler sends the property file to the Echo
Class (a Java utility to generate logs) and starts the capturing of live network traffic
(see Fig. 2).

HADEC uses Tshark [11] to capture live network traffic. Tshark is an open-source
command-line network protocol analyzer capable of capturing huge amounts of traffic. By
default, Tshark runs from the command line and prints its results to the console. To log
the traffic for later use, we developed a Java-based utility (Echo Class) that creates a
pipeline with Tshark and reads all the packets that Tshark outputs. We have also tuned
Tshark to output only the information required during the detection phase. This includes
the timestamp, src IP, dst IP, packet protocol and brief packet header information. The
following are snippets of TCP (SYN), HTTP, UDP and ICMP packets as logged in the file.


TCP (SYN)

17956 45.406170 10.12.32.1 -> 10.12.32.101
TCP 119 [TCP Retransmission] 0 > 480 [SYN]
Seq=0 Win=10000 Len=43 MSS=1452 SACK_PERM=1
TSval=422940867 TSecr=0 WS=32

HTTP

46737 2641.808087 10.12.32.1 -> 10.12.32.101
HTTP 653 GET /posts/17076163/ivc/dddc?
.=1432840178190 HTTP/1.1

UDP

139875 138.04015 10.12.32.1 -> 10.12.32.101
UDP 50 Src port: 55348 Dst port: http

ICMP

229883 2658.8827 10.12.32.1 -> 10.12.32.101
ICMP 42 Echo (ping) request id=0x0001,
seq=11157/38187, ttl=63 (reply in 229884)
As discussed above, the Traffic Handler sends the property file to the Echo Class with the
desired set of parameters (file size, file count for detection and storage path on the
capturing server) set by the admin. The Echo Class uses these parameters to generate a log
file at the specified location once it has read the required amount of data from Tshark.
Once the log file is generated, the Echo Class also notifies the Traffic Handler (see
Fig. 2).
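
The size-bounded log generation described above can be sketched in a few lines. This is a minimal illustration and not the Echo Class itself (which is a Java utility): the function name `write_log_files`, the `capture_NNNN.log` naming scheme and the parameter names are assumptions made for the example, and the packet lines are taken from any iterable rather than from a live Tshark pipe.

```python
import os
import tempfile
from typing import Iterable, List


def write_log_files(lines: Iterable[str], out_dir: str,
                    max_bytes: int, file_count: int) -> List[str]:
    """Write packet summary lines into size-bounded log files.

    A file is closed as soon as it reaches `max_bytes`. Reading stops once
    `file_count` files have been completed. If the input ends early, the last
    (partial) file is closed and returned as well.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths: List[str] = []
    current = None
    written = 0
    for line in lines:
        if current is None:
            # Hypothetical naming scheme, chosen only for this sketch.
            path = os.path.join(out_dir, f"capture_{len(paths):04d}.log")
            current = open(path, "w", encoding="utf-8")
            paths.append(path)
            written = 0
        record = line.rstrip("\n") + "\n"
        current.write(record)
        written += len(record.encode("utf-8"))
        if written >= max_bytes:
            current.close()
            current = None
            if len(paths) == file_count:
                break
    if current is not None:
        current.close()
    return paths


if __name__ == "__main__":
    sample = ["139875 138.04015 10.12.32.1 -> 10.12.32.101 UDP 50"] * 20
    with tempfile.TemporaryDirectory() as tmp:
        files = write_log_files(sample, tmp, max_bytes=200, file_count=2)
        print([os.path.basename(p) for p in files])
```

In a live setting the `lines` argument could be the standard output of a Tshark process started through `subprocess.Popen`; the three arguments after it correspond to the storage path, file size and file count that the admin sets in the property file.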


Figure 2: Network Traffic Capturing and Log Generation Component


3.2 Log Transfer Phase 

After the log file is generated, the Traffic Handler in the capturing server temporarily
pauses the traffic capturing operations of Tshark. The Traffic Handler then notifies the
detection server and shares the file information (file name, file path, server name,
etc.) with it via a web service. The detection server initiates a Secure Copy (SCP) [10]
session (with pre-configured credentials) with the capturing server and transfers the log
file from the capturing server (using the already shared name/path information) into its
local file system (see Fig. 3).

Since the detection server mainly works as a NameNode, i.e., the centerpiece of the Hadoop
cluster and HDFS (Hadoop Distributed File System), it has to transfer the log file(s) from
local storage to HDFS. On successful transfer of a log file into HDFS, the detection
server sends a positive acknowledgement to the capturing server, and both servers delete
that specific file from their local storage to maintain healthy storage capacity. On
receipt of the acknowledgement of a successful log file transfer, the Traffic Handler
restarts Tshark to capture network traffic. Before starting the DDoS detection process,
the detection server waits for the final acknowledgement from the capturing server. This
acknowledgement validates that the desired number of files of a particular size (set via
parameters by the admin) has been transferred to HDFS before the execution of the
MapReduce-based DDoS detection algorithm. There is no particular restriction on the
minimum file count before the detection starts; it could be set to one.
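
The ordering of the steps in the log transfer phase can be illustrated with a toy, in-memory model. This is only a sketch of the handshake as described above: the class and method names are hypothetical, HDFS is modelled as a plain list, and the web service call and the SCP transfer are reduced to a single method call. Error handling and retries are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DetectionServer:
    """Toy stand-in for the detection server; HDFS is modelled as a list."""
    required_files: int
    hdfs: List[str] = field(default_factory=list)

    def fetch_and_store(self, file_name: str) -> bool:
        # In HADEC this step is an SCP pull followed by a copy into HDFS.
        self.hdfs.append(file_name)
        return True  # positive acknowledgement to the capturing server

    def ready_for_detection(self) -> bool:
        # Detection may start once the configured file count is in HDFS.
        return len(self.hdfs) >= self.required_files


@dataclass
class CapturingServer:
    """Toy stand-in for the Traffic Handler on the capturing server."""
    capturing: bool = True
    local_files: List[str] = field(default_factory=list)

    def transfer(self, file_name: str, detector: DetectionServer) -> None:
        self.capturing = False                       # pause capture
        acked = detector.fetch_and_store(file_name)  # notify and transfer
        if acked:
            self.local_files.remove(file_name)       # free local storage
            self.capturing = True                    # resume capture


def run_transfer_phase(file_names: List[str],
                       required_files: int) -> Tuple[CapturingServer, DetectionServer]:
    capture = CapturingServer(local_files=list(file_names))
    detect = DetectionServer(required_files=required_files)
    for name in file_names:
        capture.transfer(name, detect)
        if detect.ready_for_detection():
            break
    return capture, detect


if __name__ == "__main__":
    capture, detect = run_transfer_phase(["a.log", "b.log", "c.log"], required_files=2)
    print(detect.hdfs, capture.local_files, capture.capturing)
```

With `required_files` set to one, detection becomes possible after the first transferred file, which matches the remark that there is no restriction on the minimum file count.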
Figure 3: Log Transfer Phase


3.3 Detection Phase 

Apache Hadoop consists of two core components, i.e., HDFS (the storage part) and MapReduce
(the processing part). Hadoop’s central management node, also known as the NameNode,
splits the data into large blocks of the same size and distributes them amongst the
cluster nodes (data nodes). Hadoop MapReduce then ships packaged code to the nodes, so
that each node processes, in parallel, the data it is responsible for.

In HADEC, the detection server mainly serves as Hadoop’s NameNode, which is the
centerpiece of the Hadoop DDoS detection cluster. On successful transfer of the log
file(s), the detection server splits the file into blocks of the same size and starts
MapReduce DDoS detection jobs on the cluster nodes (see Fig. 4). We discuss the MapReduce
job analyzer and the counter-based DDoS detection algorithm in §3.5. Once the detection
task is finished, the results are saved into HDFS.
Figure 4: DDoS Detection on Hadoop Cluster


3.4 Result Notification 

Once the execution of all the MapReduce tasks is finished, Hadoop saves the results in
HDFS. The detection server then parses the result file from HDFS and sends the
information about the attackers back to the administrator via the capturing server. Once
the results have been reported, the detection server deletes both the input and output
folders from HDFS for better storage management. Fig. 5 presents a holistic illustration
of the HADEC framework.


3.5 MapReduce Job and DDoS Detection

A MapReduce program is composed of a Map task that performs filtering and sorting and a
Reduce task that performs a summary operation. Here we explain how HADEC implements the
detection of DDoS flooding attacks (UDP, HTTP GET, ICMP and TCP-SYN) as a MapReduce task
on the Hadoop cluster using a counter-based algorithm.


3.5.1 HADEC Mapper job

After a MapReduce job starts, the first task is a mapper task, which takes its input from
HDFS as a block. In our case the block represents a file in text format, and the input for
each iteration of the mapper function is a single line from the file. Each line in the
file contains only brief information about a network packet captured through Tshark (see
§3.1). The term network packet, as used in the rest of this section, refers to a single
line of the file read as mapper input.

A mapper job takes a pair of data as input and returns a list of (key, value) pairs. The
mapper output type may differ from the mapper input type; in our case the input of the
mapper is a pair consisting of a number i and the network packet. The output is a list of
(key, value) pairs with the src IP address as the key and the network packet as the value.
The mapper job also uses hashing to combine all the log records on the basis of the src IP
address, so that it becomes easier for the reducer to analyze the attack traffic.

After all the mappers have finished their jobs, the data or worker nodes perform a shuffle
step. During shuffling the nodes redistribute the data based on the output keys, such that
all data belonging to one key is located on the same worker node (see Fig. 6).

In HADEC, for the analysis and detection of a UDP flooding attack, the mapper task filters
out the packets carrying UDP information. In particular, the mapper function searches for
packets with QUIC/UDP information. QUIC stands for Quick UDP Internet Connections. For
each packet that contains the desired information, the mapper function generates an output
in the form of a (key, value) pair. The pseudocode for the mapper function is as follows.


% UDP detection mapper function
function Map is
    input: integer i, a network packet
begin function
    filter packet with QUIC/UDP type
    if packet does not contain information then
        ignore that packet
    else
        produce one output record (src ip, packet)
    end if
end function


For TCP-SYN, ICMP and HTTP-GET based flooding attacks, the mapper function searches for
SYN, ICMP and HTTP-GET packet type information respectively.
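
The mapper logic described above can be sketched outside Hadoop as a plain generator. This is a hedged illustration rather than the HADEC implementation: the function names are hypothetical, and the parser assumes the log line layout shown in §3.1, i.e., packet number, timestamp, src IP, an arrow, dst IP, protocol, length and the remaining info fields.

```python
from typing import Iterator, Optional, Tuple

# Protocol labels accepted by the UDP-flood mapper, per the description above.
UDP_PROTOCOLS = {"UDP", "QUIC"}


def parse_src_and_protocol(line: str) -> Optional[Tuple[str, str]]:
    """Extract (src_ip, protocol) from one logged packet summary line.

    Assumed layout: <no> <time> <src> -> <dst> <protocol> <length> <info...>
    Returns None when the line does not follow that layout.
    """
    fields = line.split()
    if len(fields) < 6 or fields[3] != "->":
        return None
    return fields[2], fields[5]


def udp_mapper(offset: int, line: str) -> Iterator[Tuple[str, str]]:
    """Emit (src_ip, packet_line) for UDP/QUIC packets and drop everything else.

    `offset` plays the role of the integer i in the pseudocode and is unused.
    """
    parsed = parse_src_and_protocol(line)
    if parsed is None:
        return
    src_ip, protocol = parsed
    if protocol in UDP_PROTOCOLS:
        yield src_ip, line


if __name__ == "__main__":
    log = [
        "139875 138.04015 10.12.32.1 -> 10.12.32.101 UDP 50 Src port: 55348 Dst port: http",
        "229883 2658.8827 10.12.32.1 -> 10.12.32.101 ICMP 42 Echo (ping) request",
    ]
    for i, packet in enumerate(log):
        for key, value in udp_mapper(i, packet):
            print(key, "|", value)
```

Mappers for the other attack types would differ only in the protocol or flag they test for, for example checking for `[SYN]` in the info fields of TCP packets.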

3.5.2 HADEC Reducer job and Counter-Based Algorithm

Once the mapper tasks are completed, the reducer starts operating on the list of key/value
pairs (i.e., IP/packet pairs) produced by the mapper functions.

Each reducer is assigned a group with a unique key, which means that all the packets with
the same key (the same src IP in our case) are assigned to one reducer. We can configure
Hadoop to run reducer jobs on a varying number of data nodes. For efficiency and
performance it is very important to identify the correct number of reducers required for
finalizing the analysis job. HADEC runs a counter-based algorithm on the reducer nodes to
detect DDoS flooding attacks. The reducer function takes key/value pairs (src IP, packet
of type X) as input and, after counting the number of instances, produces a single
key/value pair (src IP, no. of packets of type X) as output (see Fig. 6).

The counter-based algorithm is the simplest, yet a very effective, algorithm for analyzing
DDoS flooding attacks by monitoring the traffic volumes for src IP addresses.

The algorithm counts all the incoming packets of a particular type (UDP, ICMP, HTTP, etc.)
associated with a unique IP address in a unit of time. If the traffic volume or count for
a src IP exceeds the pre-defined threshold, that particular IP is declared an attacker.
The pseudocode for the reducer function using the counter-based algorithm for the UDP
attack is as follows.

/* Reducer function for UDP attack detection */
function Reduce is
    input: <source ip, UDP Packets>
begin function
    count := # of packets for src IP
    if count is greater than THRESHOLD then
        /* This IP is declared to be the attacker IP */
        produce one output <Src IP, # of Packets>
    else
        ignore (do nothing)
    end if
end function
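
The counter-based reducer can likewise be sketched in plain Python. This is an illustration under stated assumptions, not the HADEC code: `shuffle` is a hypothetical helper that imitates the grouping Hadoop performs between the map and reduce steps, the default threshold of 500 is one of the values used in the evaluation, and the "unit time" is taken to be one batch of log data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

THRESHOLD = 500  # packets per source IP within one batch of log data


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group mapper output by key, as the MapReduce shuffle step would."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for src_ip, packet in pairs:
        grouped[src_ip].append(packet)
    return grouped


def udp_reducer(src_ip: str, packets: Iterable[str],
                threshold: int = THRESHOLD) -> Iterator[Tuple[str, int]]:
    """Emit (src_ip, count) only when the count exceeds the threshold."""
    count = sum(1 for _ in packets)
    if count > threshold:
        yield src_ip, count


def detect_attackers(pairs: Iterable[Tuple[str, str]],
                     threshold: int = THRESHOLD) -> List[Tuple[str, int]]:
    """Run shuffle and reduce over mapper output and list flagged sources."""
    results: List[Tuple[str, int]] = []
    for src_ip, packets in sorted(shuffle(pairs).items()):
        results.extend(udp_reducer(src_ip, packets, threshold))
    return results


if __name__ == "__main__":
    mapped = [("10.12.32.1", "udp packet")] * 600 + [("10.12.32.7", "udp packet")] * 40
    print(detect_attackers(mapped))
```

Sources whose count does not exceed the threshold produce no output at all, which mirrors the "ignore (do nothing)" branch of the pseudocode.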

4. TESTBED AND EVALUATIONS 
Figure 5: HADEC: Hadoop Based DDoS Detection Framework

Figure 6: Mapper and Reducer Operations


In this section we discuss the testbed deployment of HADEC and how we evaluated the
performance of the proposed framework with different experiments.

4.1 HADEC TestBed 

HADEC performs two main tasks: (a) capturing and transfer of network traffic and (b)
detection of DDoS flooding attacks. For capturing the traffic we use a single-node
capturing server to capture, process and send the network traffic to the detection
server. For DDoS detection, we deploy a single-node detection server (which also acts as
the NameNode of the Hadoop cluster) and a Hadoop detection cluster consisting of ten
nodes. Each node in our testbed (one capturing server, one detection server and ten Hadoop
data nodes) has a 2.60 GHz Intel Core i5 CPU, 8 GB RAM, a 500 GB HDD and a 1 Gbps Ethernet
card. All the nodes in HADEC run Ubuntu 14.04 and are connected over a Gigabit LAN. We
used Hadoop version 2.6.0 for our cluster and YARN [2] to handle all the JobTracker and
TaskTracker functionality.

Several attack generation tools are available online, such as LOIC [4], Scapy [9],
Mausezahn [6], Iperf [3], etc. For our testbed evaluations we mainly used Mausezahn,
because of its ability to generate a huge amount of traffic with random IPs to emulate
different numbers of attackers. We deployed three dedicated attacker nodes along with a
couple of legitimate users to flood the victim machine (the capturing server) with a
traffic volume of up to 913 Mbps (practically the highest possible on a Gigabit LAN). The
HADEC testbed is shown in Fig. 7. For the evaluations we focused only on the UDP flooding
attack, due to its tendency to reach high volume from a limited number of hosts. We would
also like to add that for all the evaluations we used only a single reducer; different
variations were tried, but there were no performance gains.

4.2 Performance Evaluation 

The overall performance of HADEC depends on the time taken for capturing the log file(s)
at the capturing server, transferring the file(s) to the detection server and executing
the counter-based DDoS detection algorithm on the Hadoop cluster. For our evaluations, we
varied different parameters, namely the log file size, the Hadoop cluster size, the Hadoop
split or block size and the threshold for the counter-based algorithm, and measured their
impact on the performance of HADEC.

Figure 7: HADEC Testbed

4.2.1 Traffic Capturing and File Transfer 

The capturing server works on two major tasks simultaneously. First, it captures a huge
amount of network traffic (913 Mbps in our testbed) and transforms it into log file(s).
Second, it transfers the log file(s) to the detection server for further processing. This
simultaneous execution of capture and transfer operations is important for live analysis
of DDoS flooding attacks, but on the other hand both operations compete for resources.

Fig. 8 shows the capturing and transfer time taken by the capturing server for log files
of different sizes. The capturing time is almost linear in the file size. It takes
approx. 2 seconds to log a file of 10 MB, which extends to 142 seconds for a 1 GB file.
File transfer takes 14 seconds for a 10 MB file and approx. 35 seconds for a 1 GB file.
This shows a clear improvement in throughput with the increase in file size. It is also
interesting to note that the transfer operation has to compete for bandwidth, and during
peak time more than 90% of the bandwidth is consumed by the attack traffic.
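
The throughput improvement can be made concrete with a short calculation over the timings quoted above. This is only a back-of-the-envelope sketch: the function name is hypothetical, and 1 GB is taken as 1024 MB, which is an assumption of the example.

```python
def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Average throughput in MB/s for handling `size_mb` in `seconds`."""
    return size_mb / seconds


# (file size in MB, capture time in s, transfer time in s) as reported above.
# 1 GB is taken as 1024 MB, which is an assumption of this sketch.
MEASUREMENTS = [(10, 2, 14), (1024, 142, 35)]

if __name__ == "__main__":
    for size_mb, capture_s, transfer_s in MEASUREMENTS:
        capture = throughput_mb_per_s(size_mb, capture_s)
        transfer = throughput_mb_per_s(size_mb, transfer_s)
        print(f"{size_mb:>5} MB: capture {capture:.1f} MB/s, transfer {transfer:.1f} MB/s")
```

Under these assumptions the effective transfer throughput rises from roughly 0.7 MB/s for the 10 MB file to roughly 29 MB/s for the 1 GB file, which suggests that a fixed per-transfer overhead dominates for small files.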

4.2.2 Number of Attackers and Attack Volume 

Table 1 presents the relationship between the size of the log file, the total number of
attackers and the aggregate traffic volume.

Figure 8: Capture and transfer time of a log file.

Table 1: Relationship of Log File Size with No. of Attackers and Traffic Volume

File Size (MB)    No. of Attackers    Traffic Vol.
10                100                 0.22 GB
50                500                 0.67 GB
100               1500                1.67 GB
200               2000                3.23 GB
400               4000                5.91 GB
600               6000                9.14 GB
800               8000                12.37 GB
1000              10,000              15.83 GB

HADEC uses a counter-based algorithm to detect attackers. This means that during a DDoS
flooding attack, any particular attacker has to cross a certain volume threshold to be
detected. According to Table 1, the capturing server has to analyze approx. 0.24 GBs of
network traffic to generate a log file of 10 MB, and that file can represent 100-plus
attackers that cross the flooding frequency threshold of 500-1000 packets. Increasing the
log file size also increases the capability to capture accurate information related to
attackers. There is a trade-off between the log file size and the overall detection rate;
therefore, the admin has to adjust the framework parameters to best fit different attack
scenarios.
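
The relationship in Table 1 can be summarised as a ratio of traffic volume to log size. The following is a small sketch of that arithmetic; the function name is hypothetical, the values are copied from Table 1, and 1 GB is taken as 1000 MB for the conversion.

```python
# (log file size in MB, aggregate traffic volume in GB) taken from Table 1.
TABLE_1 = [(10, 0.22), (50, 0.67), (100, 1.67), (200, 3.23),
           (400, 5.91), (600, 9.14), (800, 12.37), (1000, 15.83)]


def traffic_gb_per_log_gb(log_mb: float, traffic_gb: float) -> float:
    """Traffic volume represented by one GB of log, with 1 GB taken as 1000 MB."""
    return traffic_gb / (log_mb / 1000)


if __name__ == "__main__":
    for log_mb, traffic_gb in TABLE_1:
        ratio = traffic_gb_per_log_gb(log_mb, traffic_gb)
        print(f"{log_mb:>5} MB log: {ratio:.1f} GB of traffic per GB of log")
```

For the larger files the ratio settles at roughly 15 to 16 GB of traffic per GB of log, so extrapolating linearly, a 10 GB log would correspond to on the order of 160 GB of traffic. The linear extrapolation is an assumption of this sketch.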

4.2.3 DDoS Detection on Hadoop Cluster 

We evaluate the performance of the DDoS detection phase for different log file sizes,
different data block sizes for MapReduce tasks and different threshold values for the
counter-based detection algorithm. For our evaluations we used one fixed 80-20 attack
volume (80% attack traffic and 20% legitimate traffic). We used this setting to emulate
flooding behavior, where attack traffic surpasses legitimate traffic.

Fig. 9 shows the detection time on the Hadoop cluster. In this experiment we used a fixed
threshold of 500 and a data block size of 128 MB. Detection is performed with different
file sizes and a varying number of cluster nodes. As the file size increases, the amount
of attack traffic also increases, which affects the frequency and duration of mapper and
reducer operations. In short, as the file size increases the detection time increases,
and so does the detection rate, i.e., the number of attacker IPs detected, which is a plus
point. An increase in cluster size hardly affects the detection time for files smaller
than 400 MB; on the contrary, in some cases it might increase a little due to the added
management cost. Hadoop enables parallelism by splitting the files into blocks of a
specified size. Files smaller than the Hadoop block size are not split over multiple nodes
for execution. Therefore, the overall detection time remains the same across different
numbers of cluster nodes.
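
Why small files do not benefit from a larger cluster can be illustrated with a short calculation of how many blocks a file occupies. This is a simplified sketch: the function names are hypothetical, and it assumes one map task per HDFS block, ignoring any finer details of how Hadoop computes input splits.

```python
import math


def num_blocks(file_mb: float, block_mb: int = 128) -> int:
    """Number of HDFS blocks a file occupies (at least one)."""
    return max(1, math.ceil(file_mb / block_mb))


def usable_nodes(file_mb: float, cluster_nodes: int, block_mb: int = 128) -> int:
    """Upper bound on nodes that can run map tasks for this file at once,
    assuming one map task per block."""
    return min(cluster_nodes, num_blocks(file_mb, block_mb))


if __name__ == "__main__":
    for size_mb in (10, 50, 100, 200, 400):
        print(f"{size_mb:>4} MB file: {num_blocks(size_mb)} block(s), "
              f"up to {usable_nodes(size_mb, 10)} of 10 nodes busy")
```

With a 128 MB block size, the 10, 50 and 100 MB files each fit in a single block and therefore keep only one node busy during the map step, regardless of the cluster size.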

Starting from a file size of 400 MB, the detection time improves with the increase in
cluster size. For bigger files like 800 MB and 1000 MB, Hadoop works more efficiently. We
can see that the detection time reduces by around 27 and 30 for the 800 and 1000 MB files
respectively when the cluster size is increased from 2 to 10 nodes. This is because with
a 1000 MB file there are 9 blocks, and with the increase in cluster size Hadoop assigns
the tasks to different nodes in parallel.

Figure 9: Detection time at Hadoop cluster with 500 threshold

Fig. 10 shows the detection time on the Hadoop cluster with a threshold value of 1000. In
this experiment we only change the threshold value, and all the remaining settings are
the same as for Fig. 9. With the increase in threshold value the total number of inputs
for the reducers also increases, and this increases the reducer time. This is why the
majority of the results shown in Fig. 10 have a couple of seconds higher detection time
compared to the results in Fig. 9.
Figure 10: Detection time at Hadoop cluster with 1000 threshold
(Plotted series: 10 MB file: 22.364, 22.237, 21.509, 22.01, 22.1; 50 MB file: 24.937,
24.528, 24.581, 23.483, 23.641; 100 MB file: 27.49, 27.462, 26.52, 26.4, 26.39.)

Fig. 11 shows the effect of varying block sizes on the detection time for a 1 GB file. In
this experiment we use a fixed threshold of 500 and three different block sizes: 32, 64
and 128 MB. For the 1 GB file, the block size of 128 MB gives the maximum performance
gains in terms of detection time as the number of cluster nodes increases. With a smaller
block size there are more splits, which results in multiple tasks being scheduled on a
mapper and adds management overhead.
Figure 11: Detection time with different block sizes on 1 GB file

The effect of cluster size is prominent on large files.
This is because with large files, Hadoop can effectively
split the files into multiple blocks and distribute them
across the available cluster nodes. Figs. 12 and 13 show
the effect of different block sizes and cluster sizes on
the detection time, with a fixed threshold of 500 and
80-20 attack volume. The 128 MB block size gives the most
efficient results; this is because when the number of
blocks increases, the resource manager in Hadoop needs to
manage each of the blocks and its result, and thus takes
more time to manage the tasks. With a larger block size, a
single map task processes each large block, so fewer tasks
need to be managed. On a 10 GB file with a block size of
128 MB, Hadoop finished the detection task in approx. 7.5
mins with a cluster size of 2 nodes. The detection time
goes down to approx. 4.5 mins when the cluster size is
increased to 10 nodes. For a 20 GB file with a block size
of 128 MB, the time to finish the detection task is 14.6
mins and 8.3 mins on a cluster of 2 and 10 nodes
respectively. If we extrapolate from the numbers in Table
1, HADEC can effectively resolve 100K attackers for an
aggregate traffic volume of 159 GBs with a 10 GB log file
in just 4.5 mins. These numbers double for a 20 GB log
file.
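
The figures above can be cross-checked with a few lines of arithmetic.
The sketch below uses only numbers quoted in this section (detection
times on 2 and 10 nodes, and the 1 GB log file corresponding to 15.83
GBs of traffic and 10K attackers). The linear scaling in extrapolate is
our assumption, and the helper names are illustrative only.

```python
# Detection times in minutes as reported in the text: (2 nodes, 10 nodes).
reported = {"10 GB": (7.5, 4.5), "20 GB": (14.6, 8.3)}


def speedup(t_few_nodes: float, t_many_nodes: float) -> float:
    """Ratio of detection time on the small cluster to the large one."""
    return t_few_nodes / t_many_nodes


def extrapolate(log_gb, traffic_per_gb=15.83, attackers_per_gb=10_000):
    """Assumed linear scaling from the 1 GB figures: (traffic GB, attackers)."""
    return log_gb * traffic_per_gb, log_gb * attackers_per_gb


for size, (t2, t10) in reported.items():
    print(f"{size}: {speedup(t2, t10):.2f}x faster on 10 nodes than on 2")

traffic_gb, attackers = extrapolate(10)
print(f"10 GB log ~ {traffic_gb:.1f} GB of traffic, {attackers:,} attackers")
```

Linear scaling gives about 158 GBs of traffic for a 10 GB log file,
close to the 159 GBs quoted above. The speedup from 2 to 10 nodes is
well below fivefold in both cases, which suggests that fixed per-job
overheads still account for a noticeable part of the detection time.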



Figure 12: Effect of block size on 10 GB file 
Figure 13: Effect of block size on 20 GB file

4.2.4 Overall Framework Performance 

Figs. 14 and 15 show the overall performance of
our proposed framework in detecting DDoS attacks.
These numbers represent the total time required for
capturing, processing, transferring and detection with
different file sizes. For the experiments in Fig. 14 we
used an 80-20 attack volume, a 128 MB block size and a
threshold of 500. For the experiments in Fig. 15 we only
changed the threshold to 1000. In Fig. 14 we can observe
that as the file size increases, the overall overhead of
the capturing and transferring phases increases.
A 10 MB file takes approx. 16 seconds (42%) in the
capturing/transferring phase and 21 seconds in the
detection phase. The best case for a 1 GB file (10 node
cluster) takes 178 seconds (77%) in the
capturing/transferring phase and just 50 seconds in the
detection phase. On the whole, it takes between 3.82 and
4.3 mins to analyze a 1 GB log file, which can resolve 10K
attackers and is generated from an aggregate attack volume
of 15.83 GBs.
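
The phase breakdown quoted above can be reproduced from the rounded
timings. This is a minimal sketch using only the capture/transfer and
detection times given in the text; the function name phase_breakdown is
ours.

```python
def phase_breakdown(capture_s: float, detect_s: float):
    """Return (total seconds, capture/transfer share of the total)."""
    total = capture_s + detect_s
    return total, capture_s / total


# (label, capture/transfer seconds, detection seconds) from the text.
cases = [("10 MB", 16, 21), ("1 GB, 10 nodes", 178, 50)]

for label, capture_s, detect_s in cases:
    total, share = phase_breakdown(capture_s, detect_s)
    print(f"{label}: total {total} s ({total / 60:.2f} min), "
          f"capture/transfer share {share:.0%}")
```

With these rounded inputs the shares come out at roughly 43% and 78%,
within about one percentage point of the 42% and 77% reported; the
small gap is presumably due to rounding of the published timings. The
1 GB total of 228 seconds is about 3.8 mins, matching the lower bound
quoted above.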
Figure 14: Total time to detect a DDoS attack
in HADEC with 500 threshold

Figure 15: Total time to detect a DDoS attack
in HADEC with 1000 threshold

4.3 Discussion

Based on the results presented in this section, we can
conclude that HADEC is capable of analyzing huge volumes
of DDoS flooding attack traffic in a scalable manner.
Several GBs of attack traffic (a 1 GB log file generated
from 15.83 GBs of live traffic) can be analyzed in less
than 5 mins. By using a small log file size, the overall
detection time can be reduced to tens of seconds (30-40
seconds). But small log files also have an inherent
limitation on the number of attackers they can track.
There is no single recommended setting; admins will have
to tune the framework configuration to best match
their requirements.

We also noticed that with smaller files, Hadoop does
not provide parallelism. This means that if an admin
configures HADEC to work on small files of under
800 MB, there is no point in setting up a multi-node
cluster. A single-node or two-node Hadoop cluster
will do the job within a few minutes (2-3) with the
hardware settings we used in our testbed. In our
evaluations of HADEC, the capturing and transferring
phases were the main performance overhead, and the
majority of the framework time was spent in these phases.
This problem could be resolved by using a reasonable to
high-end server optimized for traffic operations, instead
of the mid-level Core i5 desktops used in our testbed.

5. CONCLUSIONS 

In this paper, we present HADEC, a scalable Hadoop
based Live DDoS Detection framework that is capable
of analyzing DDoS attacks in affordable time. HADEC
captures live network traffic, processes it to log
relevant information in brief form, and uses MapReduce
and HDFS to run a detection algorithm for DDoS flooding
attacks. HADEC solves the scalability, memory
inefficiency and process complexity issues of
conventional solutions by utilizing the parallel data
processing promised by Hadoop. The evaluation results
showed that HADEC takes less than 5 mins to process
(from capturing to detecting) 1 GB of log file,
generated from approx. 15.83 GBs of live network
traffic. With a small log file the overall detection
time can be further reduced to tens of seconds.

We have observed that capturing live network traffic
incurs the real performance overhead for HADEC.
In the worst case, the capturing phase consumes 77% of
the overall detection time. As future work, the HADEC
framework could be optimized to improve capturing
efficiency.